
---
title: "NOAA's Earthquakes Database"
author: "FG"
date: "8/7/2017"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This package provides functions and ggplot2 geometries to clean and visualize the earthquake records in the NOAA database.

It was built as the final project for the "Mastering Software Development in R Capstone" course on Coursera.org in July 2017.

 

Process Data

The first step is to get the data set and clean it for further use. Earthquake data can be downloaded from NOAA's web page.

rawdata <- readr::read_delim("../data/signif.txt.tsv", delim = "\t", progress = FALSE)
colnames(rawdata)
##  [1] "I_D"                               
##  [2] "FLAG_TSUNAMI"                      
##  [3] "YEAR"                              
##  [4] "MONTH"                             
##  [5] "DAY"                               
##  [6] "HOUR"                              
##  [7] "MINUTE"                            
##  [8] "SECOND"                            
##  [9] "FOCAL_DEPTH"                       
## [10] "EQ_PRIMARY"                        
## [11] "EQ_MAG_MW"                         
## [12] "EQ_MAG_MS"                         
## [13] "EQ_MAG_MB"                         
## [14] "EQ_MAG_ML"                         
## [15] "EQ_MAG_MFA"                        
## [16] "EQ_MAG_UNK"                        
## [17] "INTENSITY"                         
## [18] "COUNTRY"                           
## [19] "STATE"                             
## [20] "LOCATION_NAME"                     
## [21] "LATITUDE"                          
## [22] "LONGITUDE"                         
## [23] "REGION_CODE"                       
## [24] "DEATHS"                            
## [25] "DEATHS_DESCRIPTION"                
## [26] "MISSING"                           
## [27] "MISSING_DESCRIPTION"               
## [28] "INJURIES"                          
## [29] "INJURIES_DESCRIPTION"              
## [30] "DAMAGE_MILLIONS_DOLLARS"           
## [31] "DAMAGE_DESCRIPTION"                
## [32] "HOUSES_DESTROYED"                  
## [33] "HOUSES_DESTROYED_DESCRIPTION"      
## [34] "HOUSES_DAMAGED"                    
## [35] "HOUSES_DAMAGED_DESCRIPTION"        
## [36] "TOTAL_DEATHS"                      
## [37] "TOTAL_DEATHS_DESCRIPTION"          
## [38] "TOTAL_MISSING"                     
## [39] "TOTAL_MISSING_DESCRIPTION"         
## [40] "TOTAL_INJURIES"                    
## [41] "TOTAL_INJURIES_DESCRIPTION"        
## [42] "TOTAL_DAMAGE_MILLIONS_DOLLARS"     
## [43] "TOTAL_DAMAGE_DESCRIPTION"          
## [44] "TOTAL_HOUSES_DESTROYED"            
## [45] "TOTAL_HOUSES_DESTROYED_DESCRIPTION"
## [46] "TOTAL_HOUSES_DAMAGED"              
## [47] "TOTAL_HOUSES_DAMAGED_DESCRIPTION"
data <- eq_clean_data(rawdata)

The function eq_clean_data creates a DATE column and makes sure that the relevant variables are numeric. It also calls eq_location_clean to clean the name of the location where each earthquake occurred. An example follows:

eq_location_clean("JORDAN: BAB-A-DARAA,AL-KARAK")
## [1] "Bab-a-Daraa,al-Karak"
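
The cleaning steps described above can be approximated in base R. This is only a sketch, not the package's implementation: the names clean_location_sketch and clean_data_sketch are hypothetical, the toy rows are illustrative rather than taken from the NOAA file, and filling a missing month or day with 1 is an assumption. The simple capitalization rule used here also upper-cases every word, so it returns "Bab-A-Daraa,Al-Karak" rather than the exact string shown above.

```r
# Hypothetical helper: drop the "COUNTRY:" prefix and capitalize each word.
clean_location_sketch <- function(x) {
  x <- sub("^[^:]+:\\s*", "", x)                      # remove country prefix
  gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

# Hypothetical helper: build a DATE column and coerce key columns to numeric.
clean_data_sketch <- function(df) {
  month <- ifelse(is.na(df$MONTH), 1, df$MONTH)       # assumption: missing month -> 1
  day   <- ifelse(is.na(df$DAY), 1, df$DAY)           # assumption: missing day -> 1
  df$DATE <- as.Date(ISOdate(df$YEAR, month, day))
  df$EQ_PRIMARY <- as.numeric(df$EQ_PRIMARY)
  df$DEATHS <- as.numeric(df$DEATHS)
  df$LOCATION_NAME <- clean_location_sketch(df$LOCATION_NAME)
  df
}

# Toy rows (illustrative values only).
toy <- data.frame(
  YEAR = c(2010, 2011), MONTH = c(3, NA), DAY = c(20, NA),
  EQ_PRIMARY = c("7.2", "6.1"), DEATHS = c("3", NA),
  LOCATION_NAME = c("JORDAN: BAB-A-DARAA,AL-KARAK", "MEXICO: SITE B"),
  stringsAsFactors = FALSE
)
cleaned <- clean_data_sketch(toy)
str(cleaned[, c("DATE", "EQ_PRIMARY", "DEATHS", "LOCATION_NAME")])
```

This sketch handles only positive years; the NOAA file also contains events dated before year 1, which need separate treatment.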

 

Earthquake Timeline

A convenient way to visualize the earthquakes is to plot a timeline with one point per earthquake, using the ggplot2 geometry geom_timeline(). The size and color of the points can be mapped to other variables: by default, size represents the earthquake magnitude and color the number of deaths. Each timeline can be associated with a category through the y aesthetic. By default, the category is the country.

The additional geometry geom_timeline_label() adds a vertical line and a label to identify the earthquakes. The parameter n_max sets how many labels are plotted, to avoid overplotting.

A dedicated theme, theme_timeline(), has been designed to improve the visualization. It is a modification of theme_classic().

data %>%
  dplyr::filter(COUNTRY == "USA" | COUNTRY == "MEXICO",
                YEAR > 2000) %>%
  ggplot2::ggplot(ggplot2::aes(x = DATE, y = COUNTRY)) +
    geom_timeline(ggplot2::aes(size = EQ_PRIMARY, color = DEATHS), fill = NA) +
    geom_timeline_label(ggplot2::aes(label = LOCATION_NAME, size = EQ_PRIMARY), n_max = 5) +
    theme_timeline() +
    ggplot2::labs(size = "Richter Scale value:", color = "# of Deaths:")

[Figure: geom_timeline plot]
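
To make the roles of the aesthetics and of n_max concrete, a similar plot can be approximated with standard ggplot2 layers. This is a sketch under stated assumptions, not how geom_timeline() or geom_timeline_label() are implemented: the toy data are made up, and selecting the n_max largest magnitudes for labeling is an assumed reading of the parameter. The vertical connector line drawn by the package is omitted.

```r
library(ggplot2)

# Toy data (illustrative values only).
quakes <- data.frame(
  DATE = as.Date(c("2001-01-10", "2005-06-15", "2010-03-20", "2012-09-05")),
  COUNTRY = c("USA", "MEXICO", "USA", "MEXICO"),
  EQ_PRIMARY = c(5.5, 6.8, 7.2, 6.1),
  DEATHS = c(0, 12, 3, 1),
  LOCATION_NAME = c("Site A", "Site B", "Site C", "Site D"),
  stringsAsFactors = FALSE
)

# Assumption: label only the n_max strongest earthquakes.
n_max <- 2
top <- head(quakes[order(-quakes$EQ_PRIMARY), ], n_max)

p <- ggplot(quakes, aes(x = DATE, y = COUNTRY)) +
  geom_point(aes(size = EQ_PRIMARY, colour = DEATHS), alpha = 0.6) +
  geom_text(data = top, aes(label = LOCATION_NAME),
            angle = 45, hjust = 0, nudge_y = 0.2) +
  theme_classic() +
  labs(size = "Magnitude", colour = "# of Deaths")

print(p)
```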

 

Earthquakes Map

Another way to visualize the earthquakes is with an interactive map, built with the leaflet package. The function eq_map plots a map with one point per earthquake. The size of each point is proportional to the earthquake magnitude. Earthquakes with no magnitude information, which is often the case for old events, are still plotted, but as a small grey dot. Additional information can be displayed in a pop-up window.

data %>%
  dplyr::filter(COUNTRY == "MEXICO" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")
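
A map of this kind can be sketched directly with leaflet. The function eq_map_sketch below is hypothetical and is not the package's eq_map: the fallback radius of 2, the colors, and the toy coordinates are all assumptions chosen for illustration.

```r
library(leaflet)

# Hypothetical stand-in for eq_map: one circle marker per earthquake.
eq_map_sketch <- function(df, annot_col = "DATE") {
  has_mag <- !is.na(df$EQ_PRIMARY)
  df$radius <- ifelse(has_mag, df$EQ_PRIMARY, 2)   # assumption: small dot if no magnitude
  df$colour <- ifelse(has_mag, "blue", "grey")     # grey for missing magnitude
  df$popup  <- as.character(df[[annot_col]])       # pop-up text from the chosen column
  leaflet(df) %>%
    addTiles() %>%
    addCircleMarkers(lng = ~LONGITUDE, lat = ~LATITUDE,
                     radius = ~radius, color = ~colour,
                     weight = 1, popup = ~popup)
}

# Toy rows (illustrative values only).
toy_map <- data.frame(
  DATE = as.Date(c("2010-03-20", "2012-09-05")),
  LONGITUDE = c(-99.1, -96.7),
  LATITUDE = c(19.4, 17.1),
  EQ_PRIMARY = c(7.1, NA)
)
m <- eq_map_sketch(toy_map, annot_col = "DATE")
m
```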

A further improvement is provided by the function eq_create_label, which builds the HTML code shown in the pop-up by combining date, location, magnitude, and number of deaths. It can be called as follows:

data %>%
  dplyr::filter(COUNTRY == "ITALY" & lubridate::year(DATE) >= 1000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
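
The kind of HTML string such a label function produces can be sketched in base R. The function create_label_sketch is hypothetical; the field captions, the bold tags, and the choice to skip missing values are assumptions, since the exact markup generated by eq_create_label is not shown here.

```r
# Hypothetical stand-in for eq_create_label: one HTML string per row.
create_label_sketch <- function(df) {
  # Emit "<b>Name:</b> value<br/>", or nothing when the value is missing.
  field <- function(name, value) {
    ifelse(is.na(value), "", paste0("<b>", name, ":</b> ", value, "<br/>"))
  }
  paste0(
    field("Date", as.character(df$DATE)),
    field("Location", df$LOCATION_NAME),
    field("Magnitude", df$EQ_PRIMARY),
    field("Total deaths", df$DEATHS)
  )
}

# Toy rows (illustrative values only).
toy_labels <- data.frame(
  DATE = as.Date(c("2010-03-20", "2012-09-05")),
  LOCATION_NAME = c("Site C", "Site D"),
  EQ_PRIMARY = c(7.2, 6.1),
  DEATHS = c(NA, 1),
  stringsAsFactors = FALSE
)
labels <- create_label_sketch(toy_labels)
cat(labels, sep = "\n")
```

Because the result is a character vector with one element per row, it can be stored in a column with dplyr::mutate() and passed to a map function by column name, as in the call above.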

 

This package is released under the MIT license. See the license file for details.



frenkg/coursera.eq documentation built on May 12, 2019, 1:04 p.m.